Thematic Representation of Short Text Messages with Latent Topics: Application in the Twitter context

Authors

  • Mohamed Morchid
  • Richard Dufour
  • Georges Linarès
Abstract

The amount of information exchanged over the Internet is continuously growing, often in the form of short text messages on microblogging platforms such as Twitter. Because these messages are limited in size, understanding them may require knowing the context in which they occur. In this paper, we propose a higher-level representation of short text messages based on a thematic model obtained by Latent Dirichlet Allocation (LDA). We evaluate the effectiveness of this representation in the experimental setup of the INEX 2012 tweet contextualization task. The topic-based representation makes it possible to extend the message vocabulary by retrieving a set of thematically related words. Results demonstrate the value of this topic-space-based approach for the tweet contextualization task.

Keywords: Short text message, Thematic representation, Latent Dirichlet Allocation, Keyword extraction, Twitter
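The vocabulary-extension idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the topic names, words, and probabilities below are made-up stand-ins for the output of a trained LDA model, and the topic scoring (summing P(word | topic) over the message tokens) is a deliberate simplification of real LDA inference. The function names `topic_weights` and `expand_vocabulary` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy values for P(word | topic). In practice these would come
# from an LDA model trained on a large corpus; the numbers here are invented.
TOPIC_WORD = {
    "sports": {"match": 0.30, "goal": 0.25, "team": 0.25, "league": 0.20},
    "politics": {"election": 0.35, "vote": 0.30, "party": 0.20, "team": 0.15},
}


def topic_weights(tokens, topic_word, smoothing=1e-6):
    """Project a message into the topic space.

    Simplification (an assumption, not true LDA inference): each topic is
    scored by summing P(word | topic) over the message tokens, and the
    scores are then normalized so they sum to 1.
    """
    scores = {
        topic: sum(dist.get(tok, 0.0) for tok in tokens) + smoothing
        for topic, dist in topic_word.items()
    }
    total = sum(scores.values())
    return {topic: score / total for topic, score in scores.items()}


def expand_vocabulary(tokens, topic_word, n_words=3):
    """Return the n_words best thematically related words not in the message."""
    weights = topic_weights(tokens, topic_word)
    word_scores = defaultdict(float)
    for topic, weight in weights.items():
        for word, prob in topic_word[topic].items():
            if word not in tokens:
                # Weight each candidate word by how relevant its topic is.
                word_scores[word] += weight * prob
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(word_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n_words]]


if __name__ == "__main__":
    message = ["goal", "match", "tonight"]
    print(topic_weights(message, TOPIC_WORD))
    print(expand_vocabulary(message, TOPIC_WORD, n_words=2))
```

In this sketch, a message that mentions only sports terms is mapped almost entirely to the sports topic, so the words proposed for expansion are drawn from that topic's most probable words.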

Similar resources

Short Text Classification Improved by Learning Multi-Granularity Topics

Understanding the rapidly growing short text is very important. Short text is different from traditional documents in its shortness and sparsity, which hinders the application of conventional machine learning and text mining algorithms. Two major approaches have been exploited to enrich the representation of short text. One is to fetch contextual information of a short text to directly add more...

A High-Performance Model based on Ensembles for Twitter Sentiment Classification

Background and Objectives: Twitter sentiment classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks like Twitter intensively. Twitter lets users publish tweets to say what they think about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...

Situation and Text: Representation of Migrants Whilst the Escalation of Refugee Crisis in Great Britain as Compared to Russia

Increasing migration is a vital concern for a globalizing sociocultural environment in today’s world. The UK and developed European countries have become an attractive destination for asylum seekers (labelled as “migrants”) in the past decade. The rapid rise in the number of asylum seekers, which was labelled “migration crisis” (Ruz, 2015), made this topic an integral part of scientific discuss...

Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment

Twitter is a microblogging website where users read and write millions of short messages on a variety of topics every day. This study uses the context of the German federal election to investigate whether Twitter is used as a forum for political deliberation and whether online messages on Twitter validly mirror offline political sentiment. Using LIWC text analysis software, we conducted a conte...

Incorporating Tweet Relationships into Topic Derivation

With its rapid user growth, Twitter has become an essential source of information about what events are happening in the world. It is critical to be able to derive topics from Twitter messages (tweets), that is, to determine and characterize their main topics. However, tweets are very short in nature and therefore the frequency of term co-occurrences...

Publication date: 2013